ABSTRACT



A system and method for acquiring, processing, and com- 
paring an image with a stored image to determine if a match 
exists. In particular, the system refines the image data 
associated with an object based on pre-stored color values, 
such as flesh tone color. The system includes a storage 
element for storing flesh tone colors of a plurality of people, 
and a defining stage for localizing a region of interest in the 
image. A combination stage combines the unrefined region 
of interest with one or more pre-stored flesh tone colors to 
refine the region of interest based on color. This flesh tone 
color matching ensures that at least a portion of the image 
corresponding to the unrefined region of interest having 
flesh tone color is incorporated into the refined region of 
interest. Hence, the system can localize the head, based on 
the flesh tone color of the skin of the face in a rapid manner. 
According to one practice, the refined region of interest is 
smaller than or about equal to the unrefined region of 
interest. 

58 Claims, 13 Drawing Sheets 
REAL-TIME FACIAL RECOGNITION AND
VERIFICATION SYSTEM 

BACKGROUND OF THE INVENTION 

The present invention relates to systems for identifying an individual and, in another case, verifying the individual's identity to perform subsequent tasks, such as allowing access to a secured facility or permitting selected monetary transactions to occur.

Modern identification and verification systems typically 
provide components that capture an image of a person, and 
then with associated circuitry and hardware, process the 
image and then compare the image with stored images, if 
desired. In a secured access environment, a positive match 
between the acquired image of the individual and a pre- 
stored image allows access to the facility. 

The capture and manipulation of image data with modern identification systems places an enormous processing burden on the system. Prior art systems have addressed this problem by using Principal Component Analysis (PCA) on image data to reduce the amount of data that needs to be stored to operate the system efficiently. An example of such a system is set forth in U.S. Pat. No. 5,164,992, the contents of which are hereby incorporated by reference. However, certain environmental standards must still be present to ensure the accuracy of the comparison between the newly acquired image and the pre-stored image. In particular, the
individual is generally positioned at a certain location prior 
to capturing the image of the person. Additionally, the 
alignment of the body and face of the individual is controlled 
to some degree to ensure the accuracy of the comparison. 
Lighting effects and other optical parameters are addressed 
to further ensure accuracy. Once the individual is positioned 
at the selected location, the system then takes a snapshot of 
the person, and this still image is processed by the system to 
determine whether access is granted or denied. 

The foregoing system operation suffers from a real-time cost that slows the overall performance of the system. Modern system applications require more rigorous determinations in terms of accuracy and time in order to minimize the inconvenience to people seeking access to the facility or attempting to perform a monetary transaction, such as at an automated teller machine (ATM). Typical time delays to properly position and capture an image of the person, and then compare the image with pre-stored images, are on the order of 3 to 5 seconds or even longer. Consequently, these near real-time systems are quickly becoming antiquated in today's fast-paced and technology-dependent society. There thus exists a need in the art to develop a facial identification and verification system that acquires and processes images of the individual in real time.

Accordingly, an object of this invention is to provide a 
real-time identification and verification system. 

Another object of this invention is to provide an identi- 
fication system that simplifies the processing of the acquired 
image while concomitantly enhancing the accuracy of the 
system. 

Other general and more specific objects of the invention 
will in part be obvious and will in part appear from the 
drawings and description which follow. 

SUMMARY OF THE INVENTION 

The present invention provides facial recognition systems and methods for acquiring, processing, and comparing an image with a stored image to determine if a match exists. The facial recognition system determines the match in substantially real time. In particular, the present invention employs a motion detection stage, a blob stage and a color matching stage at the input to localize a region of interest (ROI) in the image. The ROI is then processed by the system to locate the head, and then the eyes, in the image by employing a series of templates, such as eigen templates. The system then thresholds the resultant eigenimage to determine if the acquired image matches a pre-stored image.

This invention attains the foregoing and other objects with a system for refining an object within an image based on color. The system includes a storage element for storing flesh tone colors of a plurality of people, and a defining stage for localizing a region of interest in the image. Generally, the image is captured from a camera, and hence the ROI is derived from image data corresponding to real-time video. This ROI is generally unrefined, in that the system further processes the image to localize or refine the image data corresponding to a preferred ROI, such as a person's head. In this case, the unrefined region of interest includes flesh tone colors. A combination stage combines the unrefined region of interest with one or more pre-stored flesh tone colors to refine the region of interest based on color. This flesh tone color matching ensures that at least a portion of the image corresponding to the unrefined region of interest having flesh tone color is incorporated into the refined region of interest. Hence, the system can rapidly localize the head based on the flesh tone color of the skin of the face. According to one practice, the refined region of interest is smaller than or about equal to the unrefined region of interest.

According to one aspect, the system includes a motion detector for detecting motion within a field of view, and the flesh tone colors are stored in any suitable storage element, such as a look-up table. The flesh tone colors are compiled by generating a color histogram from a plurality of reference people. The resultant histogram is representative of the distribution of colors that constitute flesh tone color.

According to another aspect, a blob stage is also employed for connecting together selected pixels of the object in the image to form a selected number of blobs. This stage, in connection with the motion detector, rapidly and with minimal overhead cost localizes a ROI within the image.
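
The cooperation between the motion detector and the blob stage can be illustrated with a short Python sketch. It is a minimal illustration under stated assumptions, not the patented implementation: frames are plain lists of luminance rows, motion is found by simple frame differencing against a fixed threshold, changed pixels are grouped into 4-connected blobs, and the bounding box of the largest blob is taken as the rough ROI. The threshold value, the largest-blob rule, and the names `motion_mask`, `largest_blob` and `rough_roi` are hypothetical.

```python
from collections import deque


def motion_mask(prev, curr, threshold):
    """Binary motion image: True where the luminance change exceeds the threshold."""
    return [[abs(c - p) > threshold for p, c in zip(prow, crow)]
            for prow, crow in zip(prev, curr)]


def largest_blob(mask):
    """Return the (row, col) pixels of the largest 4-connected blob of True pixels."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    best = []
    for y in range(h):
        for x in range(w):
            if not mask[y][x] or seen[y][x]:
                continue
            # Breadth-first flood fill collects one blob.
            blob, queue = [], deque([(y, x)])
            seen[y][x] = True
            while queue:
                cy, cx = queue.popleft()
                blob.append((cy, cx))
                for ny, nx in ((cy + 1, cx), (cy - 1, cx), (cy, cx + 1), (cy, cx - 1)):
                    if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if len(blob) > len(best):
                best = blob
    return best


def rough_roi(prev, curr, threshold=20, min_pixels=1):
    """Bounding box (top, left, bottom, right) of the largest moving blob, or None."""
    blob = largest_blob(motion_mask(prev, curr, threshold))
    if len(blob) < min_pixels:
        return None  # no suitable motion in the field of view
    ys = [y for y, _ in blob]
    xs = [x for _, x in blob]
    return (min(ys), min(xs), max(ys), max(xs))
```

Because the rough ROI only bounds the largest moving region, it is typically larger than the head; the flesh tone refinement described in this summary is what tightens it.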
According to another aspect, when generating the flesh tone colors, the system employs a first histogram stage for sampling the flesh tone colors of the reference people to generate a first flesh tone color histogram. The color is then transformed into ST color space. The system can also optionally employ a second histogram stage for generating a second color histogram not associated with the face within the image, which is also transformed into ST color space.

According to still another aspect, the system applies an erosion operation to the image data corresponding, for example, to a face, to separate pixels corresponding to hair from pixels corresponding to the face, as well as to reduce the size of an object within the image, thereby reducing the size of the unrefined region of interest.

According to yet another aspect, the system also performs a dilation operation to expand one of the regions of interest to obtain the object (e.g., face or eyes) within the image.

The present invention also contemplates a facial recognition and identification system for identifying an object in an image. The system includes an image acquisition element for acquiring the image, a defining stage for defining an
unrefined region of interest corresponding to the object in the image, and optionally a combination stage for combining the unrefined region of interest with pre-stored flesh tone colors to refine the region of interest to ensure that at least a portion of the image corresponding to the unrefined region of interest includes flesh tone color. The refined region of interest can be smaller than or about equal to the unrefined region of interest.

According to another aspect, the system also includes a detection module for detecting a feature of the object.

According to another aspect, the combination stage combines a blob with one or more flesh tone colors to develop or generate the ROI.

According to another aspect, the system further includes a compression module for generating a set of eigenvectors of a training set of people in the multi-dimensional image space, and a projection stage for projecting the feature onto the multi-dimensional image space to generate a weighted vector that represents the person's feature corresponding to the ROI. A discrimination stage compares the weighted vector corresponding to the feature with a pre-stored vector to determine whether there is a match.

BRIEF DESCRIPTION OF THE DRAWINGS

The foregoing and other objects, features and advantages of the invention will be apparent from the following description and from the accompanying drawings, in which like reference characters refer to the same parts throughout the different views. The drawings illustrate principles of the invention and, although not to scale, show relative dimensions.

FIG. 1 is a schematic block diagram of a real time facial recognition system according to the teachings of the present invention.

FIG. 2 is a schematic block diagram of the image acquisition and detection portions of the real time facial recognition system of FIG. 1 in accordance with the teachings of the present invention.

FIG. 3 is a more detailed schematic depiction of the detection stage of FIG. 2, which includes a color matching stage in accord with the teachings of the present invention.

FIG. 4A is another detailed schematic block diagram depiction of the detection stage illustrating the erosion and dilation operations performed on the image according to the teachings of the present invention.

FIG. 4B is a schematic illustrative depiction of the manner in which color values stored in the color table are combined with a region of interest generated by the detection stage of FIG. 3 in accordance with the teachings of the present invention.

FIG. 5 is a schematic depiction of the scaling and low resolution eigenhead feature of the present invention.

FIG. 6 is a more detailed schematic block diagram depiction of the real time facial recognition system of FIG. 1 according to the teachings of the present invention.

FIGS. 7A through 7C illustrate various embodiments of center-weighted windowing functions employed by the facial recognition system according to the teachings of the present invention.

FIG. 8 is a block diagram depiction of the fast Fourier transform stage for generating a correlation map.

FIG. 9 is a flow-chart diagram illustrating the generation of the eigenfaces by employing a dot product in accordance with the teachings of the present invention.

FIGS. 10 and 10A are flow-chart diagrams illustrating the acquisition and determination of a selected region of interest by the facial recognition system according to the teachings of the present invention.

FIG. 11 is a more detailed schematic block diagram depiction of the image manipulation stage of FIG. 1 in accordance with the teachings of the present invention.

FIG. 12 is a flow-chart diagram illustrating the discrimination performed by the real time facial recognition system of FIG. 1 according to the teachings of the present invention.

DESCRIPTION OF ILLUSTRATED EMBODIMENTS

The present invention relates to an image identification and verification system that can be used in a multitude of environments, including access control facilities, monetary transaction sites and other secured installations. The present invention has wide applicability to a number of different fields and installations, but for purposes of clarity will be discussed below in connection with an access control verification and identification system. The following use of this example is not to be construed in a limiting sense.

FIG. 1 illustrates a facial identification and verification system 20 according to the teachings of the present invention. The illustrated system 20 includes a multitude of serially connected stages. These stages include an image acquisition stage 22, a frame grabber stage 26, a head find stage 28, an eye find stage 30, and an image manipulation stage 34. These stages acquire an image of an object, such as a person, and digitize it. The head and eyes are then located within the image. The image manipulation stage 34 places the image in suitable condition for compression and subsequent comparison with pre-stored image identification information. Specifically, the output of the image manipulation stage 34 serves as the input to a compression stage 36, which can be a principal component analysis compression stage. This stage produces eigenvectors from a reference set of images projected into a multi-dimensional image space. The vectors are then used to characterize the acquired image. The compression stage 36 in turn generates an output signal, which serves as an input to a discrimination stage 38, which determines whether the acquired image matches a pre-stored image.

FIG. 2 illustrates in further detail the front end portion of the system 20. The image acquisition stage 22 includes a video camera 40, which produces an S-video output stream 42 at conventional frame rates. Those of ordinary skill will appreciate that the video camera used herein may be a monochrome camera, a full color camera, or a camera that is sensitive to non-visible portions of the spectrum. Those skilled in the art will also appreciate that the image acquisition stage 22 may be realized as a variety of different types of video cameras and, in general, any suitable mechanism for providing an image of a subject may be used as the image acquisition stage 22. The image acquisition stage 22 may, alternatively, be an interface to a storage device, such as a magnetic storage medium or other components for storing images or image data. As used herein, "image data" refers to data such as luminance values, chrominance values, grey scale and other data associated with, defining or characterizing an image.

The video output stream 42 is received by a frame grabber 26, which serves to latch frames of the S-video input stream and to convert the S-video analog signal into a digitized output signal, which is then processed by the remainder of the system 20. It is known that conventional video cameras
produce an analog video output stream of 30 frames per second, and thus the frame grabber 26 is conventionally configured to capture and digitize image frames at this video rate. The video camera need not be limited to S-video, and can include near IR or IR mode, which utilizes RS 170 video.

The frame grabber 26 produces a digitized frame output signal 44, which is operatively communicated with multiple locations. As illustrated, the output signal 44 communicates with a broadly termed detection stage 50, which corresponds at least in part to the head find stage 28 of FIG. 1. The output signal 44 also communicates with the compression stage 36, which is described in further detail below. Those of ordinary skill will realize that the camera itself can digitize acquired images, and hence the frame grabber stage 26 can be integrated directly into the camera.

FIG. 3 is a further schematic depiction of the detection stage 50 of FIG. 2. The video frame signal 44 is received by the detection stage 50. The signal comprises an N by N array of pixels, such as a 256x256 pixel array, which have selected chrominance and luminance values. The pixels are input into the detection stage 50 and are preferably analyzed first by the motion detection stage 54. The motion detection stage 54 receives a number of input signals, as illustrated, such as signals corresponding to frame width and height, frame bit counts and type, maximum number of frames, selected sampling pixel rate, motion threshold values, maximum and minimum head size, and RGB index threshold values. One or more of these additional input signals, in combination with the frame input signal 44, trigger the motion detection stage to assess whether motion has occurred within the field of view. In particular, the motion detection stage 54 is adapted to detect subtle changes in pixel values, such as luminance values, which represent motion, especially when an object moves against a relatively still background image (such as a kiosk, cubicle or hallway). One method of determining motion is to perform a differencing function on selected pixels in successive frames and then to compare the changes in pixel values against a threshold value. If the pixel variations within the field of view exceed the threshold value, then an object is deemed to have moved within the image. Conversely, if the changes are below the threshold, the system determines that no suitable motion has occurred.

According to another technique, a spatio-temporal filtering scheme can be applied to the captured image to detect motion, as set forth in U.S. Pat. No. 5,164,992 of Turk et al., the contents of which are hereby incorporated by reference. In this scheme, a sequence of image frames from the camera 40 passes through a spatio-temporal filtering module, which accentuates image locations that change with time. The spatio-temporal filtering module identifies the locations of motion within the frame by performing a differencing operation on successive frames of the sequence of image frames. A typical output of a conventional spatio-temporal filter module has the moving object represented by pixel values having significantly higher luminance than areas of non-motion, which can appear as black.

The spatio-temporal filtered image then passes through a thresholding module, which produces a binary motion image identifying the locations of the image for which the motion exceeds a threshold. Those of ordinary skill will recognize that the threshold can be adjusted to select a certain degree of motion. Specifically, minor movements within the field of view can be compensated for by requiring heightened degrees of motion within the field of view in order to trigger the system. Hence, the thresholding module can be adjusted to locate the areas of the image containing the most motion. This filtering scheme is particularly advantageous in monitoring transaction environments where an individual seeking access to, for example, an ATM would have to approach the ATM, and thus create motion within the field of view.

According to one practice, once the detection stage 50 has detected motion and determines that the motion of the object within the field of view exceeds a selected threshold, the blob detection stage 56 analyzes the binary motion image generated by the motion detection stage 54 to determine whether motion occurs within the field of view, for example, by sensing a change in pixel content over time. From this information, the blob detection stage 56 defines a region of interest (ROI) roughly corresponding to the head position of the person in the field of view. This ROI is only a rough approximation of the region corresponding to the head and in practice is an area larger than the head of the person, although it may also be a region of about the same size. The blob detection stage employs known techniques to define and then correlate an object (e.g., the head of a person) in the image. The present invention realizes that the motion information can be employed to roughly estimate the region of interest within the image that corresponds to the person's head. In particular, the blob detection stage 56 designates a "blob" corresponding roughly to the head or ROI of the person within the field of view. A blob is defined as a contiguous area of pixels having the same uniform property, such as grey scale, luminance, chrominance, and so forth. Hence, the human body can be modeled using a connected set of blobs. Each blob has a spatial and color Gaussian distribution, and can have associated therewith a support map, which indicates which pixels are members of a particular blob. The ability to define blobs through hardware (such as that associated with the blob detection stage 56) is well known in the art, although the blob detection stage 56 can also be implemented in software. The system therefore clusters, or blobs together, pixels to create adjacent blobs, one of which corresponds to a person's head and hence is defined as the ROI.

According to another practice and with further reference to FIG. 3, the color table 60 can be employed to further refine the ROI corresponding to the head. The word "refine" is intended to mean the enhancement, increase or improvement in the clarity, definition and stability of the region of interest, as well as a further refinement in the area defined as the region corresponding to the person's head. For example, as discussed above, the ROI established by the motion detection stage is a rough region, larger than the head, that defines a general area within which the head can be found. Flesh tone colors can be employed to "tighten" or reduce the ROI characterizing the person's head to better approximate the area corresponding to the head. This process serves to refine the overall region of interest. The color table is intended to be representative of any suitable data storage medium that is accessible by the system in a known manner, such as RAM, ROM, EPROM, EEPROM, and the like, and is preferably a look-up table (LUT) that stores values associated with flesh tone colors of a sample group.

The present invention realizes that people of different races have similar flesh tones. These flesh tones, when analyzed in a three-dimensional color or RGB space, are similarly distributed therein and hence lie essentially along a similar vector. It is this realization that enables the system to store flesh tone colors in a manner that allows for the rapid retrieval of color information. The flesh tone color values are created by sampling a reference set of people, e.g., 12-20 people, and then creating a histogram or spatial distribution representative of each of the three primary colors that
constitute flesh tone, e.g., red, blue and green, using the reference set of people as a basis in ST color space ($H_f$). Alternatively, separate histograms for each color can be created. The color histogram is obtained by first reducing the 24 bit color to 18 bit color, generating the color histogram, and then transforming or converting it into ST color space from the intensity profile in the RGB space. The system then obtains the non-face color histogram in ST color space ($H_n$). This is obtained by assuming that non-face color is uniformly distributed in the RGB space. The histogram is
then converted into ST color space. The transformation into 
ST color space is performed according to the following two 
equations: 



$$T = \frac{2R - G - B}{R + G + B} \qquad \text{(Eq. 2)}$$
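
The histogram steps above, together with the Bayes normalization and 0 to 255 scaling described in the following paragraphs, can be sketched in Python. This is a minimal illustration under stated assumptions, not the patented implementation: colors are binned directly in the 18-bit RGB space rather than in ST space (the S component of the ST transform is not reproduced in this text), the non-face model is uniform over all 2^18 bins unless non-face samples are supplied, and the Bayes step assumes equal face and non-face priors, so each table entry is 255 * P_f / (P_f + P_n). The names `reduce_to_18bit`, `build_face_lut` and `face_likelihood` are hypothetical.

```python
from collections import Counter

BINS_PER_CHANNEL = 64                            # 6 bits per channel after 24-bit -> 18-bit reduction
UNIFORM_P_NONFACE = 1.0 / BINS_PER_CHANNEL ** 3  # assumed uniform non-face model over all 2**18 bins


def reduce_to_18bit(r, g, b):
    """Keep the six most significant bits of each 8-bit channel."""
    return (r >> 2, g >> 2, b >> 2)


def build_face_lut(face_pixels, nonface_pixels=None):
    """Map each 18-bit color to a 0..255 face likelihood.

    face_pixels / nonface_pixels are iterables of 8-bit (r, g, b) tuples.
    If nonface_pixels is None, the non-face histogram is taken to be uniform.
    """
    h_face = Counter(reduce_to_18bit(*p) for p in face_pixels)
    n_face = sum(h_face.values())
    h_non, n_non = None, 0
    if nonface_pixels is not None:
        h_non = Counter(reduce_to_18bit(*p) for p in nonface_pixels)
        n_non = sum(h_non.values())
    colors = set(h_face) if h_non is None else set(h_face) | set(h_non)
    lut = {}
    for color in colors:
        p_f = h_face[color] / n_face                     # normalized face histogram
        p_n = UNIFORM_P_NONFACE if h_non is None else h_non[color] / n_non
        # Posterior P(face | color) under equal priors (assumption), scaled to 0..255.
        lut[color] = round(255 * p_f / (p_f + p_n))
    return lut


def face_likelihood(lut, rgb):
    """Look up an 8-bit (r, g, b) pixel; colors absent from the table score 0."""
    return lut.get(reduce_to_18bit(*rgb), 0)
```

For example, with three skin samples and one grey sample on the face side and three grey samples on the non-face side, the grey bin scores 255 * 0.25 / (0.25 + 0.75), which rounds to 64, while the skin bin scores 255.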



The color histograms are then normalized by converting $H_f$ and $H_n$ to $P_f$ and $P_n$ according to Bayes' rule, which determines the face probability within the color space. Consequently, the normalized face can be represented as:

$$P_{face} = \frac{P_f}{P_f + P_n} \qquad \text{(Eq. 3)}$$

The system then calculates the width and height of the table, as well as the values of the face probability look-up table 60, according to the following formula:

$$\mathrm{LUT} = P_{face} \times 255 \qquad \text{(Eq. 4)}$$

A certain portion of the resultant histogram(s) is then defined, for example, about 90% of the histogram or class width, for each of the colors in the histogram. This defines upper and lower limits of color values that are deemed acceptable by the system when determining whether the input pixel values of the frame 44 are representative of flesh tone. These histogram color distributions are then stored in the color table 60.

The system 20 further includes a color adjustment stage 62 that is employed to change or adjust the flesh tone color values stored within the table. For example, if additional people are sampled, these color distribution values can be combined with the histogram values stored in the table.

With reference to FIG. 4A, during face detection, the color table values 64 are introduced to a color reduction stage, which reduces the color from 24 bit to 16 bit for ease of handling. This can be performed using known techniques. The detection stage 50 then further defines the ROI. The detection stage 50 ignores darker colors by setting to zero any pixel having a value less than 16. The system also includes a threshold stage 84 that compares the rough ROI with a threshold value to convert it to a binary image. An erosion stage 86 performs an erosion operation on the binary image to remove noise and disconnect hair pixels from face pixels. The erosion operation reduces the size of an object by eliminating area around the object edges, and eliminates foreground image details smaller than a structuring element. This increases the spacing between the face and the hair in the image. The erosion operation can be performed as follows:

$$A \ominus B = \bigcap_{b \in B} (A)_{-b}, \quad \text{where } -b = (-x, -y) \text{ if } b = (x, y) \qquad \text{(Eq. 5)}$$

Those of ordinary skill will realize that erosion is the intersection of all translations, where a translation is the subtraction of a structuring element set member from an object set member. The symbol $\ominus$ is used to signify the erosion of one set by another. In Equation 5, A is the set representing the image (ROI), B is the set representing the structuring element, and b is a member of the structuring element set B. Additionally, the symbol $(A)_{-b}$ denotes the translation of A by $-b$. After the erosion operation is completed, the detection stage 50 performs the connected component blob analysis 56 on the ROI.
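
Equation 5 can be checked with a small set-based sketch in Python, in which a binary image is represented as the set of (x, y) coordinates of its foreground pixels. The sketch is illustrative only; the names `translate` and `erode` are hypothetical, and the 2x2 structuring element used in the example is the set {(0, 0), (0, 1), (1, 0), (1, 1)} that the description gives for its dilation example.

```python
def translate(points, dx, dy):
    """Translate a set of (x, y) pixel coordinates by (dx, dy)."""
    return {(x + dx, y + dy) for x, y in points}


def erode(image, structuring_element):
    """Binary erosion: intersection of the translations (A)_{-b} over every b in B.

    A pixel p survives only if p + b lies in the image for every member b.
    """
    if not structuring_element:
        raise ValueError("structuring element must be non-empty")
    result = None
    for bx, by in structuring_element:
        shifted = translate(image, -bx, -by)  # (A)_{-b}
        result = shifted if result is None else result & shifted
    return result
```

A 3x3 block erodes to a 2x2 block, while a line only one pixel wide disappears entirely, which is the mechanism by which erosion can break thin strands of hair pixels away from the face region.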

After the blob analysis is performed on the image by the blob detection stage 56, the dilation stage 88 performs a dilation operation thereon to obtain the face regions within the ROI. The dilation operation is employed to expand or thicken the ROI, and is thus the counterpart (dual) of erosion. Furthermore, the dilation operation is the union of all translations of the image by the structuring element members, and is defined as follows:

$$A \oplus B = \bigcup_{b \in B} (A)_{b} \qquad \text{(Eq. 6)}$$

The symbol $\oplus$ signifies the dilation of one set by another. In Equation 6, A is the set representing the image, B is the set representing the structuring element, and b is a member of the structuring element set B. Additionally, the term $(A)_b$ represents the translation of A by b. According to one practice, the set B can be defined as including the following coordinates: {(0, 0), (0, 1), (1, 0), (1, 1)}. The output of the dilation stage is the ROI. The system can further process the image data by defining the largest area as the dominant face region and merging other smaller face regions into the dominant face region. The center of the ROI is then determined by placing a 128x128 pixel box on the ROI (e.g., the face) and setting its center as:

$$X_{center} = \text{mean } X \text{ of the dominant face region}$$

$$Y_{center} = \text{top of the face region} + \frac{\text{average sampled face height}}{4}$$
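
The dilation of Equation 6 and the box placement above can be sketched in the same set-based style. Assumptions: image rows grow downward, so the top of a region is its smallest y value; the center uses only the dominant (largest 4-connected) region rather than merging the smaller regions into it, because the merge rule is not spelled out; and the names `dilate`, `connected_regions` and `face_box` are hypothetical.

```python
def translate(points, dx, dy):
    """Translate a set of (x, y) pixel coordinates by (dx, dy)."""
    return {(x + dx, y + dy) for x, y in points}


def dilate(image, structuring_element):
    """Binary dilation: union of the translations (A)_b over every b in B."""
    result = set()
    for bx, by in structuring_element:
        result |= translate(image, bx, by)
    return result


def connected_regions(points):
    """Split a pixel set into 4-connected regions."""
    remaining = set(points)
    regions = []
    while remaining:
        seed = remaining.pop()
        region, stack = {seed}, [seed]
        while stack:
            x, y = stack.pop()
            for n in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
                if n in remaining:
                    remaining.remove(n)
                    region.add(n)
                    stack.append(n)
        regions.append(region)
    return regions


def face_box(face_pixels, avg_face_height, box_size=128):
    """Center a box_size x box_size box on the dominant (largest) face region."""
    dominant = max(connected_regions(face_pixels), key=len)
    x_center = sum(x for x, _ in dominant) / len(dominant)        # mean X
    y_center = min(y for _, y in dominant) + avg_face_height / 4  # top + height/4
    half = box_size / 2
    return (x_center, y_center), (x_center - half, y_center - half,
                                  x_center + half, y_center + half)
```

With the 2x2 structuring element, dilating a single pixel reproduces the structuring element itself, and dilating a 2x2 block yields a 3x3 block.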

The foregoing detection stage 50 hence compares the rough ROI with the contents of the color table 60, performs selected erosion and dilation operations to obtain the pixels associated with the face (by analyzing chrominance values), and ultimately refines the ROI based on the contents of the color table 60. The entire operation is illustratively shown as a logic operation in FIG. 4B. Specifically, the detection stage 50 inputs data associated with the blob or rough head ROI 66 generated by the blob detection stage 56 to one input terminal of an AND gate 70. The color table 60 is coupled by communication pathway 64 to the other input of the AND gate 70. The illustrated gate 70 performs a logic operation on the inputs and generates an output image that corresponds to the overlap of identical data values at the input. This operation serves to refine the rough ROI. The ROI is tightened, that is, made smaller than or maintained at approximately the same size as the rough ROI, since the flesh tone colors that exist in the ROI and match the stored color values in the table 60 are retained, while colors in the ROI that are not stored in the table 60 are discarded. Hence, the ROI is processed to produce a refined ROI 74 that more closely resembles the person's head. Those of ordinary skill will realize that the foregoing logic operation is merely exemplary of the refinement feature of the invention, and can be implemented in software as well as hardware.
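
In software, the AND gate 70 of FIG. 4B reduces to an element-wise logical AND of two binary masks. The following Python sketch is a minimal illustration, not the patented implementation: it assumes the color table output is available as a per-pixel likelihood in the range 0 to 255, binarizes it at a hypothetical threshold of 128, ANDs it with the rough ROI mask, and reports the bounding box of what survives. The names `flesh_mask` and `refine_roi` are hypothetical.

```python
def flesh_mask(likelihood, threshold=128):
    """Binarize per-pixel face likelihoods (0..255) read from the color table."""
    return [[v >= threshold for v in row] for row in likelihood]


def refine_roi(rough_mask, flesh):
    """AND the rough ROI with the flesh tone mask; return (refined mask, bounding box)."""
    refined = [[r and f for r, f in zip(r_row, f_row)]
               for r_row, f_row in zip(rough_mask, flesh)]
    coords = [(y, x) for y, row in enumerate(refined) for x, v in enumerate(row) if v]
    if not coords:
        return refined, None  # no flesh tone pixels inside the rough ROI
    ys = [y for y, _ in coords]
    xs = [x for _, x in coords]
    return refined, (min(ys), min(xs), max(ys), max(xs))
```

Since the AND can only remove pixels, the refined bounding box can never be larger than the rough ROI, which matches the statement that the refined ROI is smaller than or about the same size as the rough ROI.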

A significant advantage of employing the motion detection stage 54 and the color table 60 in defining the ROI corresponding to the head is that these features can be performed in real time, since there is generally little or no processing, and hence time, cost associated with employing the motion detection and color features of the detection stage 50. Specifically, the motion detection stage 54 determines motion within the field of view before the system actually needs to utilize the acquired image information. For example, a person initially entering the field of view in a secured area generally does not require immediate access to the secured facility. In the meantime, the detection stage 50 detects motion, blobs together pixels that roughly correspond to the person's head, and then refines this ROI using pre-stored flesh tone colors according to the above techniques. This is performed in real time, with minimal processing cost and inconvenience to the person. Additionally, refining the ROI allows the system to more quickly and accurately locate an object, such as the eyes, within the ROI, since the ROI has been closely tailored to the actual size of the head of the person.

With reference to FIGS. 3 and 5, the detection stage 50 
can also define the head ROI when the system first detects 
motion followed by subsequent frames where no motion is 
detected, that is, when the object or person within the field 
of view is immobile, or the acquired image data is static. 
This may occur when a person originally enters the field of 
view and then immediately stops moving. The illustrated 
detection stage 50 includes an eigenhead generation stage 76 
that generates eigenvectors that correspond to a head using 
PCA theory and techniques. Specifically, the eigenhead 
stage 76 initially samples a reference set of individuals and 
performs a PCA operation thereupon to generate a series of 
eigenheads that define the distribution of heads within a 
multi-dimensional image space. The eigenheads employed 
by the present invention are preferably low resolution 
eigenheads, such as between about 17x17 pixel and about 
64x64 pixel resolution, and preferably about 21x21 pixel 
resolution, since a rough size match rather than intricate 
feature matching is all that is required to quickly define the 
ROI. An advantage of employing low resolution eigenheads 
is that they are relatively fast to process. 
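Below is a minimal sketch of eigenhead generation. It assumes the reference heads have already been cropped, aligned, and resampled to a common low resolution such as 21x21. The sketch obtains the principal components from a singular value decomposition of the mean-adjusted data, which is one standard way to perform PCA. The function name `eigenheads` is hypothetical.

```python
import numpy as np


def eigenheads(heads: np.ndarray, k: int):
    """Return the mean head, the top-k eigenheads, and their eigenvalues.

    heads: array of shape (M, H, W), e.g. M aligned low-resolution 21x21 heads.
    """
    m, h, w = heads.shape
    data = heads.reshape(m, h * w).astype(float)
    mean = data.mean(axis=0)   # average head
    phi = data - mean          # mean-adjusted heads, one per row
    # The rows of vt are orthonormal eigenvectors of the covariance (1/M) * phi.T @ phi,
    # sorted by decreasing singular value s; the matching eigenvalues are s**2 / M.
    _, s, vt = np.linalg.svd(phi, full_matrices=False)
    return mean.reshape(h, w), vt[:k].reshape(k, h, w), (s[:k] ** 2) / m


rng = np.random.default_rng(0)
heads = rng.random((30, 21, 21))  # stand-in for 30 aligned 21x21 reference heads
mean_head, eig, lam = eigenheads(heads, k=5)
print(eig.shape)  # (5, 21, 21)
```

The SVD avoids forming the full 441x441 covariance matrix explicitly. This is one reason low-resolution eigenheads are cheap to produce.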

The eigenheads generated by the eigenhead stage 76 are 
further scaled to various sizes, illustrated as head sizes 
78A-78D, to enable a complete and accurate correlation 
match. Specifically, the ROI is searched using an eigenhead
(e.g., eigenhead 78A) of a particular size as a windowing
function, and the system determines if there is a sufficiently
high correlation match. If no match is found, then the
eigenhead is scaled downward, for example, to eigenhead 
size 78B, and again the motion ROI is searched with this 
eigenhead template. This process is repeated until a match is 
found. If none is found, then the eigenhead templates are 
scaled upwards in size. Hence, the detection stage 50 
employs a multi-scale correlation technique to identify a 
ROI corresponding to a person's head by searching the ROI 
with a variable-sized eigenhead template to determine if 
there is a correlation match. 
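The multi-scale search can be sketched as follows. The sketch makes four assumptions that are not details taken from the system: template sizes are listed largest first, matches are scored by normalized cross-correlation, a fixed match threshold applies, and a nearest-neighbour resize stands in for a real image scaler. All function names are hypothetical.

```python
import numpy as np


def resize_nearest(img: np.ndarray, size: int) -> np.ndarray:
    """Nearest-neighbour resize to (size, size); a stand-in for a proper image scaler."""
    rows = (np.arange(size) * img.shape[0] / size).astype(int)
    cols = (np.arange(size) * img.shape[1] / size).astype(int)
    return img[np.ix_(rows, cols)]


def best_ncc(roi: np.ndarray, tmpl: np.ndarray):
    """Highest normalized cross-correlation of tmpl over every placement inside roi."""
    th, tw = tmpl.shape
    t = (tmpl - tmpl.mean()) / (tmpl.std() + 1e-9)
    best, where = -1.0, None
    for r in range(roi.shape[0] - th + 1):
        for c in range(roi.shape[1] - tw + 1):
            patch = roi[r:r + th, c:c + tw]
            p = (patch - patch.mean()) / (patch.std() + 1e-9)
            score = float((p * t).mean())
            if score > best:
                best, where = score, (r, c)
    return best, where


def multiscale_search(roi, eigenhead, sizes, start, threshold=0.8):
    """Try sizes[start], then progressively smaller sizes, then larger ones.

    sizes must be sorted largest first; returns (size, (row, col), score) or None.
    """
    order = list(range(start, len(sizes))) + list(range(start - 1, -1, -1))
    for i in order:
        size = sizes[i]
        if size > min(roi.shape):
            continue  # template would not fit inside the ROI
        score, where = best_ncc(roi, resize_nearest(eigenhead, size))
        if score >= threshold:
            return size, where, score
    return None


rng = np.random.default_rng(1)
eigenhead = rng.random((21, 21))                  # stand-in for a 21x21 eigenhead
roi = rng.random((40, 40))
roi[5:21, 10:26] = resize_nearest(eigenhead, 16)  # plant a 16x16 "head"
result = multiscale_search(roi, eigenhead, sizes=[24, 20, 16, 12], start=0, threshold=0.95)
print(result)  # finds size 16 at (5, 10)
```

The brute-force inner loop is deliberately simple. The frequency-domain correlation described later in this section is the faster alternative.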

FIG. 6 is a more detailed schematic representation of the 
primary eye find stage 30 of FIG. 1. As described above, the 
output of the detection stage 50 is a series or list of ROIs
corresponding to a person's head (head ROI). The ROI is 
passed through a head center and scaling stage 110 that 
centers and scales the ROI for subsequent use. Specifically, 
the center and scaling stage 110 determines the coordinates 
of the center of the region of interest. The head center 
coordinates can be determined by calculating the mean value 
of the contours of the ROI. The size of the head ROI is 
estimated as the mean distance from the head center to the 
contour edges of the ROI. This information is useful for 
determining the approximate location of the eyes within the 
ROI, since the eyes are generally located within a rough 
geometrical area of the overall head ROI. 
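This sketch assumes the ROI contour is available as an array of (row, column) boundary points. Under that assumption, the head center and size estimates described above reduce to two averages. The function name is hypothetical.

```python
import numpy as np


def head_center_and_size(contour: np.ndarray):
    """contour: (K, 2) array of (row, col) points on the head-ROI boundary."""
    center = contour.mean(axis=0)                            # mean of the contour points
    size = np.linalg.norm(contour - center, axis=1).mean()   # mean distance to the contour
    return center, size


# A circular contour of radius 10 centred at (50, 40).
theta = np.linspace(0, 2 * np.pi, 360, endpoint=False)
contour = np.column_stack([50 + 10 * np.sin(theta), 40 + 10 * np.cos(theta)])
center, size = head_center_and_size(contour)  # approximately (50, 40) and 10
```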

The output signal 112 generated by the center and scaling 
stage is communicated to a first eye find stage 120 which
comprises part of the overall identification system 20 and
specifically the primary eye find stage 30. The first eye find 
stage 120 is adapted to receive a number of input signals 
carrying a variety of different image data or information. In 
particular, the frame data signal 44 generated by the frame 
grabber 26 is received by the first eye find stage 120. 
Additionally, an eigeneye template module 130 generates 
and stores a number of eigenfeature or eigeneye templates 
corresponding to a reference set of images. The eigeneye 
templates can be constructed in known fashion, the general 
construction of which is described in further detail below. 
The eigen template module generates an output signal that is 
also received by the first eye find stage 120. 

Additionally, the eigeneye template module 130, and preferably
the first eye find stage 120, employ a selected
weighting profile, or windowing function, when correlating 
the ROI with the eigeneye templates. In particular, the 
system 20 employs a center-weighted windowing function 
that weights image data more strongly in the middle portion 
of the image while conversely weighting data less strongly 
towards the outer regions of the image. FIGS. 7A through 7C
illustrate exemplary weighting profiles 200, 206, 208
employed by the eye find stage 30 of the invention. FIG. 7A 
graphically illustrates one such weighting profile, and 
defines image data width along the abscissa, and normalized 
data weight along the ordinate. The illustrated weighting 
profile 200 has a sinusoidal shape and is employed by the
present invention as a window function. The function 
weights image data in a central region 202 of the window 
more strongly than image data at the edges of the image. 
Hence, the system accords the most weight to image data 
that has the highest percentage chance of being incorporated 
into the eigen template during production of the same. 
Conversely, the weighting profile accords less significance, 
and preferably little or no significance, to image data located 
at the boundary regions of the image. This center-weighting
window function ensures that the system maximizes the
incorporation of essential image data into the correlation,
while consistently minimizing the chance that unwanted
extraneous information is employed by the system.
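The window shapes can be sketched as follows. The half-sine, Gaussian bell, and three-level step profiles are assumed stand-ins for profiles 200, 206, and 208, whose exact parameters are not given in the text. Two further choices are also assumptions of this sketch: the 2-D window is formed as an outer product of two 1-D profiles, and it is normalized so that its squared weights sum to one. The function names are hypothetical.

```python
import numpy as np


def sinusoidal_profile(n):
    """Half-sine: near zero at both edges, 1 at the centre (cf. profile 200)."""
    t = (np.arange(n) + 0.5) / n
    return np.sin(np.pi * t)


def bell_profile(n, sigma=0.2):
    """Gaussian bell curve (cf. profile 206); sigma is an assumed relative width."""
    t = (np.arange(n) + 0.5) / n - 0.5
    return np.exp(-0.5 * (t / sigma) ** 2)


def step_profile(n, levels=(0.0, 0.5, 1.0)):
    """Stepwise profile (cf. profile 208) that rises toward the centre."""
    t = np.abs((np.arange(n) + 0.5) / n - 0.5) * 2  # 0 at the centre, near 1 at the edges
    idx = np.minimum(((1 - t) * len(levels)).astype(int), len(levels) - 1)
    return np.asarray(levels)[idx]


def window_2d(profile, n):
    """Separable 2-D window from a 1-D profile, scaled so the squared weights sum to one."""
    w = np.outer(profile(n), profile(n))
    return w / np.sqrt((w ** 2).sum())


w = window_2d(sinusoidal_profile, 21)  # centre weight is largest, corners are near zero
# A 21x21 ROI patch would then be weighted as: weighted_patch = patch * w
```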

The system 20 places the window function over a selected 
portion of the ROI, and then analyzes the ROI using this 
window function. The window function shape thus defines 
that selected portion of the image to be analyzed by the 
system of the invention. The illustrated sinusoidal-shape of 
the window function 200 thus weights more heavily data 
corresponding to the portion of the ROI that overlaps with 
the center portion of the function relative to the outer 
boundaries of the function. The use of a center-weighted 
window function enables the system 20 to avoid incorpo- 
rating unwanted image data into the eigen template. The 
image data may be accidentally corrupted when employing 
conventional window functions by including unwanted data 
associated with adjacent facial features, shading, and other 
illumination perturbations. The system avoids incorporating 
this unwanted data into the eigentemplates, thereby mini- 
mizing the likelihood of the system generating false 
matches. 

The significance of the window function shape employed 
by the identification system 20 of the present invention can 
be further illustrated by a simple example. For instance, 
eigenfaces can be created from a reference set of images in 
accord with PCA principles described in greater detail 
below. One or more features of the acquired facial images 
can be utilized to form selected eigentemplates of a particu- 
lar facial feature. In one example, eigenvectors correspond- 
ing to eyes, and thus called eigeneyes, can be created from 
the reference images forming part of the reference set.
Variations among eyes are prevalent in the reference set of
images because of the various people that constitute the
reference set. Additional factors, however, influence the
random variations of the reference images. For example, if
a particular individual's image was captured while the
person was wearing eyeglasses, the system may inadvertently
include data associated with the eyeglass frame and
other glassware components when generating the eigenface.
If a standard weighting profile where image data is valued
equally thereacross were employed to analyze data
corresponding to areas surrounding each eye, the eye portion of
the image may include information corresponding to the
eyeglass frame. As is obvious to one of ordinary skill, this
additional information corrupts the overall acquired image
data, and when projected onto the image space, may actually
distort the spatial location of the eye within this image space.
Specifically, the eye may be spatially shifted right or left,
thus destroying the true spacing between eyes as well as the
particular orientation of the eye relative to other facial
features. Since this information is utilized by the system to
generate templates, which themselves are employed to
identify matches with a newly acquired image, the system could
be prone to false matches.

FIGS. 7B and 7C illustrate yet other examples of weighting
profile shapes that can also be employed by the eye find
stage 30 of the present invention. In particular, FIG. 7B
illustrates a bell-curve type weighting profile 206 that also
accords stronger weight to a middle portion of the image as
opposed to the peripheral or boundary regions. Likewise, the
step function 208 further accords, in a stepwise fashion,
more weight to image data located within the interior regions of
the image as opposed to the outer regions. Those of ordinary
skill will readily recognize that other possible window
shapes can be employed by the system 20 without departing
from the spirit and scope of the invention.

An advantage of employing the eigeneye templates in the
eye find stage 120 is that PCA projections in image subspace
require little or no processing time, and thus are
simple and efficient to use in facial reconstruction systems.
Since the Eigenface method is based on linearly projecting
an image onto multi-dimension image space, this method
yields projection directions that maximize the total scatter
across all the facial images of the reference set. The
projections thus retain unwanted variations due to lighting and
facial expression. This scatter can be greater than the
conventional scatter that is produced in the projections due to
variations in face identity. One method to overcome this
scatter is to include in the reference set a number of different
images that mimic the continuum of lighting conditions in
order to more evenly distribute points in the image space.
These additional images, however, could be costly to obtain
and require significant intrusions on the reference people.
Furthermore, analyzing and manipulating this additional
data becomes significantly cumbersome and computationally
burdensome. One technique to address the scatter in the
eigenimages is to correct for the variations in lighting and
expression during the image manipulation stage 34 or during
any other convenient stage of the illustrated facial recognition
system 20.

Those of ordinary skill will recognize that a correlation in
the Eigen approach is a nearest neighbor classifier scheme in
image space. For example, a new image (e.g., the ROI) can
be classified (recognized) by assigning to it the label of the
closest point in the reference set, as measured in the image
space. Since all of the images are normalized to have zero
mean and unit variance, classifying the nearest match is
equivalent to correlating the new image with each image in
the reference set to determine the closest match. This
correlation can be performed using the traditional Eigen
approach, or can be performed by calculating the eigen
coefficients using a fast Fourier transform (FFT) approach to
generate a correlation map. According to a preferred
practice, the system 20 employs the FFT approach in the
eye find stage 30, and specifically to any selected input to the
head find stage 28 or the eye find stage 158, to perform the
correlation between the newly acquired image and one or
more reference images.

One example of employing this FFT approach is as
follows. The input image is initially acquired and digitized,
and then processed by the detection stage 50. Having
captured a static image of interest by the techniques and
methods previously and hereinafter described, the image
(e.g., frame data and/or eigeneyes) is reduced to a digital
representation of pixel values. These pixel values correspond
to the measure of the light intensity throughout the
image. As an example, an image may be digitized to form a
rectangular or square array of pixel values which are
indicative of the light intensity within the image. For example, a
facial image can be reduced to $N$ rows by $M$ columns of
pixel data, resulting in an aggregate of $N \times M$ pixel values.
Each of the pixel values can be identified as to location by
row and column. Consequently, it is natural to represent the
digitized image as a discrete function of luminescence or
light intensity that varies by pixel location. Such a function
is represented as $I(x_i, y_j)$, where $x_i$ designates a row of pixel
locations and $y_j$ identifies a column of pixel locations, thus
identifying an individual pixel within the image.

In certain image processing applications, it is desirous or
necessary to identify or recognize a distinctive object (ROI)
or feature within the larger image. For example, in a security
application, it may be necessary to identify an individual's
face in a larger reference set of faces of individuals
authorized to access a secured location. Conventionally, this
has been accomplished by storing a digital representation of
the face of each authorized individual in a vector or matrix
representation. The digitized facial image of the person
requesting access to the secured resource is then matched
against the set of reference faces authorized for access to the
resource in order to determine if there is a match. The
matching process has conventionally been performed by a
mathematical correlation of the digital pixel values
representing the face of the individual requesting access with the
pixel values of the faces from the reference set. In
mathematical terms, the correlation is represented by the value

$$\sum_{i}\sum_{j} I(x_i, y_j)\, I_R(x_i, y_j)$$

where $I(x_i, y_j)$ is the luminescence value for the facial image
to be detected at each of the pixel values and $I_R(x_i, y_j)$ is the
corresponding facial image from the reference set. The
correlation is performed for each image from the reference
set. It is well known that a good match of digital data is
represented by a large correlation value, and thus the
reference image with the greatest correlation is considered the
best match to the image to be detected. A predetermined
thresholding value is set so as to ensure that the match is
sufficiently close. If all the calculated coefficient values are
below the threshold value, it is presumed that the detected
face or feature is not found in the matching reference set.

Since the object or feature to be identified may comprise
only a subset of the larger image, the images from the
reference set must be correlated over all possible subsets of
the image in order to detect the object or feature within the
larger image. Using the previous security example, the face
to be identified or detected may exist within a background of
unrelated objects, and also be positioned at almost any location
within the larger image. Thus, the reference faces are
correlated with all possible subsets of the image to find and
to identify the face to be matched.

While the techniques described above can be used to
calculate a correlation value, they are computationally slow
and processor intensive. For example, if an image of
320 × 640 pixels is to be compared against a set of reference
images, at least 204,800 multiplications and additions
must be performed for each reference image to calculate
the correlation values for that image. The magnitude of this
computing requirement severely restricts the number of
reference images in the reference set. Hence the system is
severely limited in the number of images it can store in the
reference set.

The methods and techniques of the current invention are
advantageously employed using the concept of an eigenface
basis to reduce this computational requirement. The face to
be detected from a training or reference set of facial images
can be defined by a mathematical relationship expressed as
$I(x_i, y_j)$. Let the training or reference set of acquired face
images be represented by $\Gamma_1, \Gamma_2, \Gamma_3, \ldots, \Gamma_M$. The average
face of this reference set is defined by

$$\Psi = \frac{1}{M}\sum_{n=1}^{M}\Gamma_n \qquad (\text{Eq. } 8)$$

where the summation is from $n = 1$ to $M$. Each reference face
differs from the average or mean face by a vector
$\Phi_i = \Gamma_i - \Psi$. Thus, the mean is found by adding all the faces
together in the reference set and then dividing by the number
of face images. The mean is then subtracted from all the face
images. A matrix is subsequently formed from the resultant
mean-adjusted faces.

This set of very large vectors associated with the reference
faces is then subjected to principal component analysis (PCA).
The PCA establishes a set of $M$ orthonormal vectors, $\mu_k$,
which best describe the distribution of face data within the
face space. The $k$th vector, $\mu_k$, is chosen such that

$$\lambda_k = \frac{1}{M}\sum_{n=1}^{M}\left(\mu_k^{T}\Phi_n\right)^2 \qquad (\text{Eq. } 9)$$

is a maximum, subject to

$$\mu_l^{T}\mu_k = \delta_{lk} = \begin{cases} 1 & \text{if } l = k \\ 0 & \text{otherwise} \end{cases} \qquad (\text{Eq. } 10)$$

The vectors $\mu_k$ and scalars $\lambda_k$ are the eigenvectors and
eigenvalues, respectively, of a rather large covariance matrix

$$C = \frac{1}{M}\sum_{n=1}^{M}\Phi_n\Phi_n^{T} \qquad (\text{Eq. } 11)$$

It has been recognized that the contrast and brightness of
each of the images in the reference set $\{\Gamma_i\}$ may differ
significantly from each other and from the image to be
matched. These differences may skew the matching results,
and thus create errors in detection. The present invention
compensates for these differences. Specifically, the image to
be matched is adjusted relative to each image from the
reference set before correlation is performed. The statistical
mean and standard deviation of all the pixel values for the
individual reference image are determined, and the pixel
values of the image to be matched are adjusted according to
the following rule:

$$I_s(x_i, y_j) = c\,I(x_i, y_j) + b \qquad (\text{Eq. } 12)$$

where $c$ and $b$ are the respective standard deviation and
mean from the image in the reference set, and $I(x_i, y_j)$ are the
original pixel values in the image to be matched.

According to a further practice, a windowing function is
defined that weights the product of the corresponding
luminescence values according to their significance in detecting
the object in an image. For example, if one were
attempting to find an eye within a facial image, a windowing
function can be defined to emphasize the correlation of
certain aspects of the reference eye and to avoid the
confusion associated with peripheral features such as eyeglasses.
In one embodiment of the invention, the windowing function
has a shape corresponding to the previously-described
center-weighted windowing function that accords greater
weight or significance to pixel values in the center of the
windowing map and lesser or no significance to those on the
edges of the map, as shown in FIGS. 7B and 7C. The map
may be employed with a two-dimensional geometry. Pixel
values outside the bounds of the map have zero weight, and
thus do not enter into the correlation calculation.

Detecting an individual face within a larger image may be
described mathematically using the above-described eigenface
concept. The foregoing discussion, while focused on
identifying an individual's face within an image, can also be
used in a more general sense to identify the head of an
individual, the eyes of an individual, or any distinctive
feature within an image. The set of basis eigenfaces is
simply [...]

In the following discussion, $x$ and $y$ are considered
vectors which in component form would be written as
$(x_i, x_j)$ and $(y_i, y_j)$. The system 20 initially defines $w(x)$
to be a window function which is centered at $x = 0$ and has
unit power,

$$\sum_{i}\sum_{j} w^2(x) = 1 \qquad (\text{Eq. } 13)$$

Let $I(x)$ be the image to be analyzed, where $I(x)$ is moved
under the window function to analyze it. The effect of
brightness and contrast variations in the part of the image
under the window is to be minimized by scaling $I(x)$ by a
factor $c$, the standard deviation of the pixel values in the
reference image undergoing analysis, and an additive constant
$b$, which is the mean of the pixel values in that reference
image. Thus the family of images that result from contrast
and brightness changes to image $I(x)$ can be modeled as
$c\,I(x) + b$, which is expressed as $I_s(x)$.

To counter contrast and brightness variation, when $I_s(x)$ is
shifted by an offset $y$ to cause $w(x)$ to overlay different
portions of the image, $I_s(x)$ maps to a new function $p(x, y)$
that has zero mean and unit power. That is,

$$\sum_{i}\sum_{j} p(x, y)\, w^2(x) = 0 \qquad (\text{Eq. } 14)$$

and

$$\sum_{i}\sum_{j} p^2(x, y)\, w^2(x) = 1 \qquad (\text{Eq. } 15)$$

These conditions require that

$$p(x, y) = \frac{I_s(x + y) - m(y)}{\sigma(y)} \qquad (\text{Eq. } 16)$$

where

$$m(y) = \sum_{i}\sum_{j} I_s(x + y)\, w^2(x) \qquad (\text{Eq. } 17)$$

$$\sigma(y) = \left[\sum_{i}\sum_{j}\big(I_s(x + y) - m(y)\big)^2 w^2(x)\right]^{1/2} \qquad (\text{Eq. } 18)$$

Note that for any value of $c$ and $b$, $c\,I(x) + b$ maps to the
same function $p(x, y)$.

The function $p(x, y)$ can be described in terms of its
coefficients with respect to a set of eigen basis functions
$\mu_k(x)$. These coefficients, which are designated as $\Omega_k(y)$,
are defined as inner products. The basis functions are
computed from the set of reference images $\Gamma_i$ that were
properly aligned so that the feature of interest (e.g., the face
to be identified) is centered at the zero point in every
reference image. In terms of the eigenfaces previously
described, the coefficients are represented as:

$$\Omega_k(y) = \sum_{i}\sum_{j} p(x, y)\,\mu_k(x)\, w^2(x)$$

For convenience, we will also stipulate that

$$\sum_{i}\sum_{j} \mu_k(x)\, w^2(x) = 0$$

This gives

$$\Omega_k(y) = \frac{1}{\sigma(y)}\sum_{i}\sum_{j}\big(I_s(x + y) - m(y)\big)\,\mu_k(x)\, w^2(x) \qquad (\text{Eq. } 19)$$

$$\Omega_k(y) = \frac{1}{\sigma(y)}\sum_{i}\sum_{j} I_s(x + y)\,\mu_k(x)\, w^2(x) \qquad (\text{Eq. } 20)$$

The weights $\Omega_k$ form a vector $\Omega^{T} = [\Omega_1\ \Omega_2\ \ldots\ \Omega_M]$
describing the contribution of each eigenface in representing the
new input face image, thus treating the eigenfaces as a basis
for the face images.

The foregoing vector can then be used in a standard
pattern recognition algorithm to determine which of the
faces from the reference set, if any, best matches the
unknown face. The simplest method for determining which
face class provides the best description of an input face
image is to find the face that has a representation in terms of
the eigenface basis vectors with a minimum Euclidean
distance between the coefficients, $\epsilon_k = \|\Omega - \Omega_k\|^2$.
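The following sketch computes the coefficients $\Omega_k(y)$ at a single window offset. It then picks the nearest reference vector by squared Euclidean distance. Three assumptions are made:

- The normalized function $p(x, y)$ is formed by subtracting the windowed mean and dividing by the windowed standard deviation of the image under the window. This is one way to satisfy the zero-mean and unit-power conditions above.
- The window has unit power.
- Each basis image has been shifted to have zero weighted sum.

The helper names are hypothetical.

```python
import numpy as np


def unit_power(w):
    """Scale a window so that its squared weights sum to one."""
    return w / np.sqrt((w ** 2).sum())


def center_basis(basis, w):
    """Shift each mu_k so that sum(mu_k * w**2) == 0 (valid when w has unit power)."""
    w2 = w ** 2
    return np.array([mu - (mu * w2).sum() for mu in basis])


def coefficients_at(image, offset, w, basis):
    """Omega_k(y) for one window offset y = (row, col) of the image I_s."""
    r, c = offset
    h, wd = w.shape
    patch = image[r:r + h, c:c + wd].astype(float)
    w2 = w ** 2
    m = (patch * w2).sum()                        # windowed mean under the window
    s = np.sqrt((((patch - m) ** 2) * w2).sum())  # windowed standard deviation
    p = (patch - m) / s                           # zero mean, unit power under w
    return np.array([(p * mu * w2).sum() for mu in basis])


def nearest_reference(omega, reference_omegas):
    """Index of the reference vector minimizing eps_k = ||Omega - Omega_k||^2."""
    d = ((reference_omegas - omega) ** 2).sum(axis=1)
    return int(np.argmin(d)), d


# Brightness/contrast changes (c*I + b with c > 0) leave the coefficients unchanged.
rng = np.random.default_rng(2)
w = unit_power(np.outer(np.hanning(8), np.hanning(8)))
basis = center_basis(rng.standard_normal((3, 8, 8)), w)
image = rng.random((20, 20))
omega = coefficients_at(image, (4, 5), w, basis)
omega_scaled = coefficients_at(2.5 * image + 10.0, (4, 5), w, basis)
print(np.allclose(omega, omega_scaled))  # True
```

Because each basis image has zero weighted sum, the mean term drops out. The coefficients therefore reduce to a windowed correlation of the raw image with each basis image, divided by the windowed standard deviation. That correlation form is what makes a frequency-domain implementation attractive.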

FIG. 8 is a schematic block diagram depiction of the eye
find stage 120 which can employ, among other things, the
Discrete Fast Fourier Transform (DFFT) approach described
above. Specifically, the eye find stage 120, for example, can
employ DFFT procedures to correlate the ROI with the eigen
template, such as the eigeneye templates, to produce a
correlation map. It has been realized that the expressions for
the correlation may be calculated in a more efficient fashion
using a DFFT approach. Specifically, the expressions may
be computed by transforming the calculation to the
frequency domain, and then performing an inverse transform
operation to obtain the result in the spatial domain. It has
been realized that the sum of products in the space domain
is equivalent to the product of the DFFTs in the frequency
domain, with one of the two transforms complex-conjugated.
An inverse DFFT of this product then
produces the required result. By transforming the
computation into the frequency domain, the inherent efficiency of the
DFFT can be utilized to significantly reduce the overall
number of calculations required to obtain the results.
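The equivalence between the spatial sum of products and the frequency-domain product can be checked with a short NumPy sketch. It builds a direct correlation map and an FFT-based one. The FFT version multiplies the image transform by the conjugate of the zero-padded template transform and then inverts the result. The two maps agree for every template placement that does not wrap around the image edge. NumPy's real FFT routines stand in for the DFFT stages, and the function names are hypothetical.

```python
import numpy as np


def correlate_direct(image, template):
    """Sum-of-products correlation map for every placement fully inside the image."""
    H, W = image.shape
    h, w = template.shape
    out = np.empty((H - h + 1, W - w + 1))
    for r in range(out.shape[0]):
        for c in range(out.shape[1]):
            out[r, c] = (image[r:r + h, c:c + w] * template).sum()
    return out


def correlate_fft(image, template):
    """Same map via the frequency domain: multiply transforms, then inverse transform."""
    H, W = image.shape
    h, w = template.shape
    f_img = np.fft.rfft2(image)
    f_tmp = np.fft.rfft2(template, s=(H, W))   # template zero-padded to the image size
    full = np.fft.irfft2(f_img * np.conj(f_tmp), s=(H, W))
    return full[:H - h + 1, :W - w + 1]         # keep only non-wrapped placements


rng = np.random.default_rng(3)
image, template = rng.random((16, 20)), rng.random((5, 4))
print(np.allclose(correlate_direct(image, template), correlate_fft(image, template)))  # True
```

The direct map costs roughly (H-h+1)(W-w+1)·h·w multiplications, while the FFT route scales as O(HW log HW). The savings therefore grow with template size.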

According to one practice, the eye find stage 120 receives 
a template 112A from the eigeneye template stage 130. The 
eye find stage 120 employs one or more transform stages 
210A and 210B to convert the eigen templates and the ROI 
signal 112 into the frequency domain to reduce the amount 
of computations necessary to produce the correlation map 
214. The DFFT stages 210A, 210B reduce the amount of 
computations since rather than constructing a map by sum- 
ming the products of the templates and the ROI, the eye find 
stage 120 of the invention merely acquires the dot product, 
in the frequency domain, of the input signals by transform- 
ing the image and template into the frequency domain. The 
converted data is then multiplied by the multiplier 213 to 
perform the foregoing dot product. The eye find stage 120 
then reconverts the data into the spatial domain by employing
the inverse transform stage 212. The stage 120 hence
generates a correlation map identical to that generated 
employing the conventional spatial technique, without 
manipulating large, complex equations. Hence, the system is 
faster, more responsive to input image data, and is capable 
of generating correlation maps in real-time. 

FIG. 9 is a schematic flowchart diagram illustrating the 
operations employed to identify an individual face within a 
larger image. The system first generates a digitized image 
consisting of a face that is to be matched to a particular face 
in a reference set of stored faces, as set forth in steps 305 and 
310. Each face within the reference set of faces is then 
normalized and converted into the frequency domain using 
the DFFT 210A, and a set of basis vectors (e.g., eigenfaces 
or eigeneyes), $\mu_k$, that span the set of known reference faces
is obtained employing conventional Eigen techniques. This 
is set forth in steps 315 and 320. 

According to step 325, the system then obtains the component
coefficients $\Omega_k$ in terms of the basis vectors $\mu_k$ for
each face within the reference set of faces by employing a 
dot product operation. This can be performed in the eye find 
stage 120. As illustrated, the stage 120 receives the centered 
and scaled ROI and an eigen eye template from the template 
stage 130. The eye find stage can employ a program or 
hardwired system that converts the eigeneye data into vector 
coefficients in the frequency domain. The resultant operation 
forms a vector $\Omega = (\Omega_1, \Omega_2, \ldots, \Omega_M)$ of component coefficients for
each face in the reference set.

The system then normalizes the unknown facial image, as 
set forth in step 330, for contrast and brightness for each 
reference image, and converts the normalized image data 
into the frequency domain using DFFT 210B. The system
then defines a windowing function of the type described 
above (e.g., center-weighted function) to emphasize selected 
local features or portions of the image. This is set forth in 
step 335. The system then overlays the image on the
windowing function, and calculates a set of component
coefficients $\Omega$ for the unknown image in terms of the
eigenfaces $\mu_k$ using a dot product operation, step 340.
Finally, as set forth in step 345, the system compares the
component coefficients $\Omega_k$ of each face from the reference set
with the coefficients of the unknown image to determine if
a match exists.

The illustrated system 20 thus provides an integrated
real-time method of detecting an individual face within an
image from a known reference set of faces by converting the
template and ROI data into the frequency domain, obtaining
the dot product, and then reconverting the data into the
spatial domain to develop a correlation map. One of ordinary
skill in the art will readily recognize that while the method
and techniques employed are described in terms of a face
detection application, the advantages and benefits of the
invention are not limited to this application. In general, the
invention can be used to advantage in any application with
the need to identify or detect an object or feature within a
digitized image, such as a head or eyes of an individual.
Moreover, in the most general application of the invention,
a known data structure or pattern of digital data from a
reference set of such data structures can be identified within
a larger set of digitized values.

In an alternate embodiment, the system can also input data
associated with eye clusters generated by the eye cluster
stage 140. The eye cluster stage 140 logically organizes a
reference set of eye images into clusters in order to develop
templates that are used by the eye find stage 120 to locate the
eyes. Specifically, as described above, the eye find stage 120
compares the centered ROI with the eye cluster template to
determine the existence of a match. Those of ordinary skill
will readily understand the use of eye clusters, and, in
accordance with the teachings of the present invention, how
they are implemented by the present system to locate a
region in the ROI.

Referring again to FIG. 6, the eye find stage 120 receives
the original image frame data 44 and the ROI that has been
scaled and centered by the scaling stage 110, and performs
a correlation match with the eigeneye templates and
windowing function to determine the eye locations within the
image. As set forth above, this correlation can be performed
in the spatial or frequency domain. If the eye find stage 120
produces a sufficiently high correlation, and thus locates the
eyes within the image, the stage generates an output signal
122 that is indicative of eye locations, and which is received
by the compression stage 36.

When the first eye find stage 120 is unable to determine
the eye location, the system 20 reverts to a backup technique
that employs the second head find stage 146 and the second
or back-up eye find stage 156. In particular, the first eye find
stage 120 generates an output signal 121 that serves to
actuate the frame grabber 26 to re-acquire an image, while
concomitantly generating an input signal for the head find
stage 146.

Similar to the first eye find stage 120, the second head find
stage 146 receives the original frame data 44, the eye find
stage output signal 121, as well as eigenhead templates
stored in the eigenhead template stage 150. The eigenhead
templates are generally low resolution eigenheads produced
by the foregoing Eigen technique. The second head find
stage 146 performs a correlation match employing the
eigenhead templates stored in the eigenhead stage 150, which
correspond to the previously captured region of
interest. Assuming there is a match at this stage, the system
produces an output signal which actuates a second eye
find stage 156, which receives signals similar to the first eye
find stage 120, to again attempt to locate the eyes. If the
system fails the second time to determine the eye locations,
the system produces an output signal 158 which actuates the
frame grabber 26 to reacquire an image. The redundant head
and eye find stages 146 and 156 increase the eye location
accuracy of the system. Those of ordinary skill will
recognize that there is a tradeoff between accuracy and time when
determining whether a newly acquired image matches a
pre-stored image. The illustrated system 20 attempts to
balance these competing concerns by opting for the fast,
real-time initial approach of locating the eyes with the first
eye find stage 120. If this fails, however, the system employs
the head find and eye find stages 146 and 156 in order to
improve the overall accuracy of the system.

The operation of the primary eye find stage 30 of FIGS.
1 and 6 is further illustrated in the flow chart schematic
diagrams of FIGS. 10 and 10A. In particular, the head ROIs
produced by the detection stage 50 of FIG. 3 serve as the
input to the primary eye find stage 30 of FIG. 6. The system
then determines whether the number of ROIs within the image
is greater than zero. This is determined in step 220. If the
number is greater than zero, the system sets a motion ROI
counter to zero, as set forth in step 224, and then proceeds
to further process the ROI. Conversely, if the system
determines that the number of head ROIs is not greater than zero,
then the system determines whether the last ROI is devoid
of appropriate image data, as set forth in step 222. If the
image is devoid of image data, then the system actuates the
image acquisition device 22 and frame grabber 26 to
reacquire an image. If not, then the system proceeds to step 226,
as set forth below.

After the system 20 sets the motion ROI counter to zero, or
determines that the last motion ROI contains data, the
system calculates the head center within the image, as set
forth in step 226. The system then proceeds to calculate the
appropriate eye scale, step 228, and then locates the eyes in
the region of interest ROI, step 230. As set forth in step 232,
if the system determines that the eyes in the ROI were
located, then an eye error counter and the last motion ROI
counter are set to zero, thus signifying that an accurate eye
location operation has occurred. This is set forth in step 234.
The system then passes the eye location image information
onto the compression stage 36.

If the eyes were not successfully located, the system, as
set forth in step 236, increments the eye error counter to
signify that an error has occurred while attempting to
identify or locate the eyes within the head ROI. The system
then reverts to a backup head find stage 146 and second
eye find stage 156 to locate the eyes. In particular, the system
once again locates the head in the region of interest, as set
forth in step 246. This particular step is in feedback
communication with two particular feedback loops 242 and 245.
As illustrated in FIG. 8, the system calculates the spatial
Cartesian coordinates of the ROI, as set forth in step 242.
This step occurs after the motion counter has been set to zero
in step 224. Additionally, the system calculates the head
center coordinates, step 244, which occurs after step
226. After the system locates the head for the second time in
the ROI, as set forth in step 246, the system then attempts to
locate the eyes. If the eyes are located this time, the system
proceeds to set the eye error counter and the last ROI
counter to zero, as set forth in step 252 (similar to step 234).
The eye location information is then transferred to the
compression stage 36.

If the system again fails to locate the eyes, the error
counter is once again incremented, as set forth in step 260,
to signify that an additional eye location failure has occurred. The system then sets the last ROI in the list to a value equal to the last motion ROI, as set forth in step 262. Once the counter is set to a value corresponding to the last ROI, the system resets itself to accommodate additional ROI information generated by the detection stage 50. The system then repeats the entire process.

With further reference to FIG. 6, the eye location image data 122 is then transferred to the compression stage 36. Those of ordinary skill will recognize that prior to receipt of the eye location information by the compression stage 36, the information passes through an image manipulation stage 34, as set forth in FIGS. 1 and 10. The eye location information can initially pass through a rotation stage 124, which rotates the image information to a selected orientation to enable an accurate and appropriate comparison with prestored images. The rotated image data is then scaled by the scaling stage 126 to an appropriate size, and then normalized by the normalization stage 128 to attain a normalized image suitable for processing by the compression stage. Image data not associated with the eyes is then masked, or removed, by the masking stage 132. The rotation stage 124, scaling stage 126, normalization stage 128, and masking stage 132 all employ conventional processes that are readily apparent to one of ordinary skill in the art.

The eye location information is then transferred to a compression stage 36, where an Eigen procedure is performed on the data. In one embodiment, this procedure is performed by first obtaining a training reference set of faces by acquiring a number of reference images. The training or reference set is normalized, as described above, so that all faces have the same scale, position, orientation, mean, and variance. The actual encoding or compression process can employ a Karhunen-Loeve transformation or an eigenvector projection technique, which encodes an image of a person's face or other facial feature, such as nose, eyes, lips, and so forth, as a weighted set of eigenvectors. This eigenvector projection technique is described more fully in U.S. Pat. No. 5,164,992, entitled "Face Recognition System", issued to Turk et al., the contents of which are hereby incorporated by reference. As described therein, an image of a face is projected onto a face space defined by a set of reference eigenvectors. The reference set of eigenvectors, or eigenfaces, can be thought of as a set of features which together characterize the variation between face images within a reference set of facial images. This distribution of faces in the reference set can be characterized by using principal component analysis to extract face information that characterizes the variations or differences between a newly acquired image (the projected image) and the eigenfaces. Principal component analysis (PCA) is a known technique. The resulting eigenvectors produced by performing the PCA define the variation between the face images within the reference set of faces, and can be referred to as eigenfaces. Thus, an eigenface is formed by multiplying each face in the training set by the corresponding coefficient in the eigenvector and summing the products. Once the eigenfaces are identified, an image signal can be represented as a function of these eigenfaces by projecting the image signal into the space defined by these eigenfaces.

The foregoing is a result of initially characterizing each face image $I(x,y)$ as a two-dimensional image having an N by N array of intensity values (8-bit). When employed to produce eigenvectors, the face image can be represented in a multi-dimensional image space as a vector (or point) of dimension $N^2$. Thus, a typical acquired image of, for example, 256 by 256 pixels becomes a vector of dimension 65,536, or equivalently, a point in a 65,536-dimensional image space. A series of acquired images can thus be mapped to a series of points within this rather vast image space.

The creation of eigenfaces turns on the realization that different facial images are nonetheless similar in overall configuration, and are not randomly distributed in the foregoing image space. The images are thus located within a rather small region of this vast image space, or in a relatively low dimensional subspace. Using principal component analysis, one can identify the vectors which best account for the distribution of face images within the entire image space. These vectors, coined "eigenfaces", define the overall "face space" of this system. As previously set forth, each vector of length $N^2$ describes an N by N image, and can be represented by a linear combination or concatenation of vector values of the original face images that constitute the reference set of images.

A portion of the mathematics associated with the creation of eigenfaces was previously described in Equations 8 through 11.

It is known that $C = \frac{1}{M}\sum_{n=1}^{M}\Phi_n\Phi_n^T$, which, apart from the constant factor $1/M$ (which does not affect the eigenvectors), equals $AA^T$, where the matrix $A = [\Phi_1\ \Phi_2\ \cdots\ \Phi_M]$. The matrix C, however, is $N^2$ by $N^2$, and determining the $N^2$ eigenvectors and eigenvalues can become an intractable task for typical image sizes. However, if the number of data points in the face space is less than the dimension of the overall image space, namely, if $M < N^2$, there are only $M-1$, rather than $N^2$, meaningful eigenvectors. Those of ordinary skill will recognize that the remaining eigenvectors have associated eigenvalues of zero. One can solve for the $N^2$-dimensional eigenvectors in this case by first solving for the eigenvectors of an M by M matrix, which is much smaller than the 16,384 by 16,384 matrix, and then taking appropriate linear combinations of the face images $\Phi_i$.

Consider the eigenvectors $v_i$ of $A^TA$ such that:

$$A^T A v_i = \mu_i v_i \qquad \text{(Eq. 21)}$$

Premultiplying both sides by A yields:

$$A A^T A v_i = \mu_i A v_i \qquad \text{(Eq. 22)}$$

from which it is apparent that $Av_i$ are the eigenvectors of $C = AA^T$.

Following this analysis, it is possible to construct the M by M matrix $L = A^TA$, where $L_{mn} = \Phi_m^T\Phi_n$, and find the M eigenvectors, $v_l$, of L. These vectors determine linear combinations of the M training set face images to form the eigenfaces $u_l$:

$$u_l = \sum_{k=1}^{M} v_{lk}\Phi_k, \qquad l = 1, \ldots, M \qquad \text{(Eq. 23)}$$

The foregoing analysis greatly reduces the calculations necessary to handle the image data, from the order of the number of pixels in the images ($N^2$) to the order of the number of images in the training set (M). In practice, the training set of face images can be relatively small ($M \ll N^2$), although larger sets are also useful, and the calculations become quite manageable. The associated eigenvalues provide a basis for ranking or ordering the eigenvectors according to their usefulness in characterizing the variation among the images, or as a function of their similarity to an acquired image. Hence, the eigenvectors embody the maximum variance between images, and successive eigenvectors have monotonically decreasing variance.
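The M by M reduction of Eqs. 21 through 23 can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the system's implementation: it assumes the training faces have already been normalized and flattened into the rows of an array, and the function name `compute_eigenfaces` and its parameters are hypothetical.

```python
import numpy as np


def compute_eigenfaces(faces: np.ndarray, num_components: int):
    """Hypothetical sketch of eigenface training via L = A^T A (Eqs. 21-23).

    faces: shape (M, N*N); each row is a flattened, normalized face Gamma_i.
    num_components: M', the number of eigenfaces to keep (at most M - 1,
    since only M - 1 eigenvectors have nonzero eigenvalues).
    Returns (psi, eigenfaces, eigenvalues); eigenfaces has shape (M', N*N).
    """
    m = faces.shape[0]
    if not 1 <= num_components <= m - 1:
        raise ValueError("only M - 1 eigenvectors are meaningful")
    psi = faces.mean(axis=0)              # average face Psi
    a = (faces - psi).T                   # A = [Phi_1 ... Phi_M], shape (N*N, M)
    l_matrix = a.T @ a                    # L = A^T A, with L_mn = Phi_m^T Phi_n
    mu, v = np.linalg.eigh(l_matrix)      # eigenpairs of symmetric L, ascending order
    order = np.argsort(mu)[::-1][:num_components]  # keep the M' largest eigenvalues
    mu, v = mu[order], v[:, order]
    u = a @ v                             # Eq. 23: u_l = sum_k v_lk Phi_k (one per column)
    u /= np.linalg.norm(u, axis=0)        # scale each eigenface to unit length
    return psi, u.T, mu
```

Because `l_matrix` is only M by M, the cost of the eigen-decomposition depends on the number of training images rather than on the $N^2$ pixels, which is the saving described above. Each column `a @ v[:, l]` is an eigenvector of $AA^T$ with eigenvalue `mu[l]`, as Eq. 22 shows.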
In practice, a smaller number of images M', or a subset of the images M, is sufficient for identification purposes, since complete and accurate reconstruction of the image is generally unnecessary to create a match. Framed as such, identification becomes essentially a pattern recognition task. Specifically, the eigenfaces span an M'-dimensional subspace of the original $N^2$-dimensional image space. The M' most significant eigenvectors of the L matrix are selected as those with the largest associated eigenvalues, and therefore contain the most useful image information, i.e., the maximum variance information.

A newly acquired face is represented by a weighted series of eigenvectors formed from the most significant eigenvectors of the image sub-space. It is important to note that this recognition technique assumes that the image, which is not part of the original reference set of images, is sufficiently "similar" to those in the training set to enable it to be well represented by the eigenfaces. Hence, a new face image ($\Gamma$) is transformed into its eigenface components (i.e., projected into the face space) by a simple operation, namely,

$$\omega_k = u_k^T(\Gamma - \Psi) \qquad \text{(Eq. 24)}$$

for $k = 1, \ldots, M'$. This describes a set of point-by-point image multiplications and summations, operations which can be performed at approximately frame rate on current image processing hardware.

The weights form a vector $\Omega^T = [\omega_1\ \omega_2\ \cdots\ \omega_{M'}]$ that describes the contribution of each eigenface in representing the input face image, treating the eigenfaces as a basis set for face images.

With reference to FIGS. 1 and 6, the Eigen head template stage 164 can include a database of the eigenfaces created by the foregoing Eigen approach. This information can be received by the compression stage 36 or by the discrimination stage 38. The compression stage 36 preferably communicates with the database of eigenfaces stored in the eigenhead template stage 164. The eye information 122 output by the first eye find stage 120 is projected by the compression stage 36 into eigenspace, and a new set of coefficients is generated that corresponds to a weighted sum of the eigen templates stored in the stage 164.

The discrimination stage 38 compares the coefficients corresponding to the new image with a pre-stored coefficient value, or threshold, to determine if a match occurs. Specifically, the foregoing vector $\Omega$ is used in a standard pattern recognition algorithm to find which of a number of pre-defined facial feature classes, if any, best describes the newly acquired image. The simplest method for determining which face class provides the best description of an input face image is to find the face class k that minimizes the Euclidean distance

$$\epsilon_k^2 = \|\Omega - \Omega_k\|^2,$$

where $\Omega_k$ is a vector describing the kth face class. The face classes $\Omega_i$ are calculated by averaging the results of the eigenface representation over a small number of face images (as few as one) of each individual. A face is classified as belonging to class k when the minimum $\epsilon_k$ is below some chosen threshold $\theta_\epsilon$. Otherwise the face is classified as "unknown", and optionally used to create a new face class, or the system can deny the person access to the secured facility.

The Euclidean distance is thus employed to compare two facial image representations to determine an appropriate match, e.g., whether the face belongs to a selected face class of pre-stored images. Thus the recognition of the newly acquired face can be verified by performing a simple threshold analysis; that is, if the Euclidean distance is below some pre-determined threshold, then there is a match, and the person, for example, can gain access to a secured facility.

Because creating the foregoing vector $\Omega^T$ of weights is equivalent to projecting the original face image onto the low-dimensional face space, many images project onto a given pattern vector. This is generally acceptable, since the Euclidean distance $\epsilon$ between the image and the face space is simply the squared distance between the mean-adjusted input image $\Phi = \Gamma - \Psi$ and $\Phi_f = \sum_{k=1}^{M'}\omega_k u_k$, its projection onto face space (where the summation is over k from 1 to M'):

$$\epsilon^2 = \|\Phi - \Phi_f\|^2.$$

Thus, there are four possibilities for an input image and its pattern vector: (1) near face space and near a face class; (2) near face space but not near a known face class; (3) distant from face space and near a face class; and (4) distant from face space and not near a known face class.

In the first case, an individual is recognized and identified. In the second case, an unknown individual is present. The last two cases indicate that the image is not a face image. Case three typically shows up as a false positive in most other recognition systems. In the described embodiment, however, the false recognition may be detected because of the significant distance between the image and the subspace of expected face images.

To summarize, the eigenfaces approach to face recognition involves the steps of (1) collecting a set of characteristic face images of known individuals; (2) calculating the matrix L; (3) finding the corresponding eigenvectors and eigenvalues; (4) selecting the M' eigenvectors with the highest associated eigenvalues; (5) combining the normalized training set of images according to Eq. 7 to produce the reduced set of eigenfaces $u_k$; (6) for each known individual, calculating the class vector $\Omega_k$ by averaging the eigenface pattern vectors $\Omega$ calculated from the original images of the individual; (7) selecting a threshold $\theta_\epsilon$ which defines the maximum allowable distance from any face class; and (8) selecting a threshold $\theta_1$ which defines the maximum allowable distance from face space.

For each new face to be identified, calculate its pattern vector $\Omega$, the distances $\epsilon_k$ to each known class, and the distance $\epsilon$ to face space. If the distance $\epsilon > \theta_1$, classify the input image as not a face. If the minimum distance $\epsilon_k \le \theta_\epsilon$ and the distance $\epsilon \le \theta_1$, classify the input face as the individual associated with class vector $\Omega_k$. If the minimum distance $\epsilon_k > \theta_\epsilon$ and $\epsilon \le \theta_1$, then the image may be classified as "unknown", and optionally used to begin a new face class.

FIG. 12 is a schematic flow-chart illustration of the discrimination or thresholding which occurs when the system attempts to determine whether a match has occurred. Specifically, as set forth in step 405, the system stores the eigen coefficients in a selected memory location, such as the eigen template stage 164. After the compression stage 36 calculates or determines the new coefficients corresponding to the newly acquired image or ROI, the system 20 searches the eigen database for a match, step 410. The system then determines whether the newly acquired face/facial feature is in the database, as set forth in step 415. This searching and matching is performed by comparing the eigenvalues of the new face with a threshold value. If the new face is greater than the threshold value, then the system signifies a match, and the person is allowed access, for example, to a secured facility, step 420. If no match occurs, then the system reacquires an image and performs the steps and operations described above in connection with system 20, as set forth in step 425.
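The projection of Eq. 24, the distance from face space, and the four-case thresholding described above can be combined into a short sketch. It assumes the eigenfaces are stored as orthonormal rows of an array (so that $\Phi_f = \sum_k \omega_k u_k$ is a single matrix product), and that the class vectors $\Omega_k$ and the thresholds $\theta_\epsilon$ and $\theta_1$ have already been chosen; the function names and return labels are hypothetical, and distances are compared as plain Euclidean norms.

```python
import numpy as np


def project(gamma, psi, eigenfaces):
    """Eq. 24: omega_k = u_k^T (Gamma - Psi), for k = 1..M'."""
    return eigenfaces @ (gamma - psi)


def class_vector(face_images, psi, eigenfaces):
    """Step (6): Omega_k as the average pattern vector of one individual."""
    return np.mean([project(g, psi, eigenfaces) for g in face_images], axis=0)


def classify(gamma, psi, eigenfaces, class_vectors, theta_eps, theta_1):
    """Return ('not a face', None), ('known', k) or ('unknown', None).

    Assumes the rows of `eigenfaces` are orthonormal.
    """
    phi = gamma - psi                           # mean-adjusted input Phi
    omega = eigenfaces @ phi                    # pattern vector Omega
    phi_f = eigenfaces.T @ omega                # Phi_f = sum_k omega_k u_k
    eps = np.linalg.norm(phi - phi_f)           # distance from face space
    eps_k = np.linalg.norm(class_vectors - omega, axis=1)  # distance to each class
    k = int(np.argmin(eps_k))
    if eps > theta_1:
        return "not a face", None               # cases (3) and (4)
    if eps_k[k] <= theta_eps:
        return "known", k                       # case (1)
    return "unknown", None                      # case (2)
```

In this sketch the face-space test is applied first, so an input that lies far from face space is rejected even if its pattern vector happens to fall near a stored class; this is how the false positive of case three is caught.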
The foregoing system performs a number of operations, either singly or in combination, that enable the acquisition, comparison and determination of a facial match in real-time, with minimal, if any, intrusion on the person. The system, furthermore, is computationally efficient and therefore avoids the time and processor intensive applications performed by prior art facial recognition systems.

It will thus be seen that the invention efficiently attains the objects set forth above, among those made apparent from the preceding description. Since certain changes may be made in the above constructions without departing from the scope of the invention, it is intended that all matter contained in the above description or shown in the accompanying drawings be interpreted as illustrative and not in a limiting sense.

It is also to be understood that the following claims are to cover all generic and specific features of the invention described herein, and all statements of the scope of the invention which, as a matter of language, might be said to fall therebetween.

Having described the invention, what is claimed as new and desired to be secured by Letters Patent is:

1. A system for refining an object within an image based on color, the system comprising:

a storage element for storing flesh tone colors of a plurality of people,

defining means for defining an unrefined region of interest corresponding to at least part of the object in the image, the unrefined region of interest including flesh tone colors,

combination means for combining the unrefined region of interest with one or more of the flesh tone colors stored in the storage element to refine the region of interest to ensure that at least a portion of the image corresponding to the unrefined region of interest having flesh tone color is incorporated into the refined region of interest, and

a motion detector for detecting motion of the image within a field of view, the motion detector comprising:

a differencing means for subtracting selected pixel values associated with generally spatially adjacent images and for generating a difference value therefrom, and

a threshold means for comparing the difference value with a threshold value to determine motion within the field of view.

2. A system in accordance with claim 1, wherein the refined region of interest is smaller than or about equal to the unrefined region of interest.

3. A system in accordance with claim 1, further comprising an image acquisition element for acquiring the image.

4. A system in accordance with claim 1, wherein the storage element comprises a look-up table.

5. A system in accordance with claim 1, wherein the storage element comprises a memory element for storing a color histogram constructed from the plurality of people, the histogram being representative of the distribution of colors that constitute flesh tone.

6. A system in accordance with claim 1, further comprising blob means for connecting together selected pixels of the object in the image to form a selected number of blobs therein.

7. A system in accordance with claim 1, further comprising the detected motion signifying the presence of the object within the field of view, and

blob means for connecting together a selected number of pixels of the object in the detected image to form a selected number of blobs therein, wherein one of the blobs corresponds to the unrefined region of interest.

8. A system in accordance with claim 7, wherein said 
unrefined region of interest corresponds to a head of a 
person, and the object corresponds to an eye of the person. 

9. A system in accordance with claim 7, wherein the 
combination means is adapted to combine one of the blobs 
with the flesh tone colors to construct the refined region of 
interest. 

10. A system in accordance with claim 1, further com- 
prising 

first histogram means for sampling the flesh tone colors of 
the plurality of people and for generating a first flesh 
tone color histogram, and 

first transform means for transforming the first color 
histogram into ST color space. 

11. A system in accordance with claim 7, further com- 
prising normalizing means for normalizing the first color 
histogram. 

12. A system in accordance with claim 10, further com-
prising conversion means for converting the histogram to a
normalized function according to Bayes Rule.

13. A system in accordance with claim 10, wherein the 
flesh tone colors correspond to a face of a person within an 
image, the system further comprising second histogram 
means for generating a second color histogram not associ- 
ated with the face within the image. 

14. A system in accordance with claim 13, further com- 
prising second transform means for transforming the second 
color histogram into ST color space. 

15. A system in accordance with claim 1, further com- 
prising 

histogram means for generating a histogram of colors 
corresponding to at least one of a face of a person in the 
image and a non-face portion of the image, and 

transforming means for transforming the histogram into 
ST color space. 

16. A system in accordance with claim 1, further com- 
prising histogram means for generating a histogram of 
colors corresponding to the object in the image. 

17. A system in accordance with claim 1, further com- 
prising means for adjusting the flesh tone colors stored 
within the storage element. 

18. A system in accordance with claim 1, wherein the 
object corresponds to a face of a person within the image, the 
system further comprising erosion means for applying an 
erosion operation to the face to separate pixels correspond- 
ing to hair from pixels corresponding to face. 

19. A system in accordance with claim 1, further com- 
prising erosion means for applying an erosion operation to 
reduce the size of an object within the image, thereby 
reducing the size of the unrefined region of interest. 

20. A system in accordance with claim 18, further com-
prising dilation means to expand one of the regions of
interest to obtain the object within the image.

21. A method of refining an object within an image based 
on color, the method comprising the steps of: 

storing flesh tone colors of a plurality of people in a 
storage element, 

defining an unrefined region of interest corresponding to 
at least part of the object in the image, the unrefined 
region of interest including flesh tone colors, 

detecting motion of the image within a field of view, 

connecting together a selected number of pixels of the 
object in the detected image to form a selected number 
of blobs therein, wherein one of the blobs corresponds 
to the unrefined region of interest, and 
combining one of the blobs that corresponds to the unrefined region of interest with one or more of the flesh tone colors stored in the storage element to refine the region of interest to ensure that at least a portion of the image corresponding to the unrefined region of interest having flesh tone color is incorporated into the refined region of interest, the refined region of interest being smaller than or about equal to the unrefined region of interest.

22. A method in accordance with claim 21, further comprising the step of generating a color histogram from the plurality of people, the histogram being representative of the distribution of colors that constitute flesh tone.

23. A method in accordance with claim 21, wherein the step of detecting motion comprises the steps of

subtracting selected pixel values associated with generally spatially adjacent images captured by the image acquisition element and for generating a difference value therefrom, and

comparing the difference value with a threshold value to determine whether motion is detected within the field of view.

24. A method in accordance with claim 21, further comprising the steps of

sampling the flesh tone colors of the plurality of people,

generating a first flesh tone color histogram from the sampling of flesh tone colors, and

transforming the first color histogram into ST color space.

25. A method in accordance with claim 24, further comprising the step of normalizing the first color histogram according to Bayes Rule.

26. A method in accordance with claim 21, further comprising the step of generating a color histogram of the object in the image.

27. A method in accordance with claim 21, further comprising the steps of

generating a color histogram of the object in the image,

transforming the second color histogram into ST color space.

28. A method in accordance with claim 21, further comprising the steps of

generating a histogram of colors corresponding to at least one of a face of a person in the image and a non-face portion of the image, and

transforming the histogram into ST color space.

29. A method in accordance with claim 21, further comprising the step of adjusting the flesh tone colors stored within the storage element.

30. A method in accordance with claim 21, further comprising the step of applying an erosion operation to a face within the image to separate pixels corresponding to hair from pixels corresponding to the face.

31. A method in accordance with claim 21, further comprising the step of applying a dilation operation to expand one of the regions of interest to obtain the face.

32. A facial recognition and identification system for identifying an object in an image, comprising:

an image acquisition element for acquiring the image,

defining means for defining an unrefined region of interest corresponding to at least part of the object in the image, the unrefined region of interest including flesh tone colors,

combination means for combining one of a selected number of blobs that corresponds to the unrefined region of interest with one or more of the flesh tone colors to refine the region of interest to ensure that at least a portion of the image corresponding to the unrefined region of interest that includes flesh tone color is incorporated into the refined region of interest, the refined region of interest being smaller than or about equal to the unrefined region of interest, said refined region of interest corresponding at least in part to the object, and

a recognition module for determining whether the acquired object matches a pre-stored object,

whereby said system recognizes the object when the object matches the pre-stored object.

33. A facial recognition system in accordance with claim 32, further comprising a detection module for detecting a feature of the object.

34. A facial recognition system in accordance with claim 32, wherein said detection module comprises:

a motion detector for detecting motion of the image within a field of view, and

blob means for connecting together a selected number of pixels of the object in the detected image to form the selected number of blobs therein.

35. A facial recognition system in accordance with claim 34, wherein said detection module further comprises selector means for selecting at least a portion of the object in the image.

36. A facial recognition system in accordance with claim 3[illegible], wherein said detection module comprises

means for generating one or more eigenheads corresponding to a set of eigenvectors generated from a reference set of images in a multi-dimensional image space,

[illegible], and

comparison means for comparing the eigenhead to the refined region of interest corresponding to the object in the image to determine whether there is a match.

37. A facial recognition system in accordance with claim 36, wherein said eigenhead is a low resolution eigenhead.

38. A facial recognition system in accordance with claim 32, further comprising location means for locating a feature of the object.

39. A facial recognition system in accordance with claim 38, wherein said location means comprises

means for representing a feature of the region of interest in the object as a plurality of eigenvectors in a multi-dimensional image space, and

correlation means for correlating the region of interest with the eigenvectors to locate the feature within the object.

40. A facial recognition system in accordance with claim 39, further comprising

a compression module for generating a set of eigenvectors of a training set of people in the multi-dimensional image space, and

projection means for projecting the feature onto the multi-dimensional image space to generate a weighted vector that represents the feature.

41. A facial recognition system in accordance with claim 40, further comprising discrimination means for comparing the weighted vector corresponding to the feature with a pre-stored vector to determine whether there is a match.

42. A facial recognition system in accordance with claim 34, wherein the motion detector comprises

differencing means for subtracting selected pixel values associated with generally spatially adjacent images
captured by the image acquisition element and for generating a difference value therefrom, and
threshold means for comparing the difference value with a threshold value to determine motion within the field of view.

43. A facial recognition system in accordance with claim 32, further comprising
first histogram means for sampling the flesh tone colors of the plurality of people and for generating a first flesh tone color histogram, and
first transform means for transforming the first color histogram into ST color space.

44. A facial recognition system in accordance with claim 43, further comprising normalizing means for normalizing the first color histogram according to Bayes Rule.

45. A facial recognition system in accordance with claim 43, wherein the flesh tone colors correspond to a face of a person within an image, the system further comprising
second histogram means for generating a second color histogram not associated with the face within the image, and
second transform means for transforming the second color histogram into ST color space.

46. A facial recognition system in accordance with claim 32, further comprising
histogram means for generating a histogram of flesh tone colors corresponding to at least one of a face of a person in the image and a non-face portion of the image, and
transform means for transforming the histogram into ST color space.

47. A facial recognition system in accordance with claim 32, further comprising histogram means for generating a histogram of flesh tone colors corresponding to the object in the image.

48. A facial recognition system in accordance with claim 32, wherein the object corresponds to a face of a person within the image, the system further comprising erosion means for applying an erosion operation to the face to separate pixels corresponding to hair from pixels corresponding to face.

49. A facial recognition system in accordance with claim 32, further comprising erosion means for applying an erosion operation to the image to reduce the size of an object within the image, thereby reducing the size of the unrefined region of interest.

50. A facial recognition system in accordance with claim 32, further comprising dilation means for applying a dilation operation to expand one of the regions of interest to obtain a face image corresponding to the object within the image.

51. A facial recognition system in accordance with claim 39, wherein said correlation means further comprises
means for storing a center-weighted windowing function,
means for [illegible] the function over the region of interest, and
means for analyzing the region of interest with the windowing function to locate the feature of the object.

52. A facial recognition system in accordance with claim 39, wherein said correlation means further comprises means for analyzing the region of interest with a center-weighted windowing function to locate the feature of the object.

53. A facial recognition system in accordance with claim 38, further comprising second location means for determining the location of the feature of the object when the first location module is unable to locate the feature.

54. A facial recognition system in accordance with claim 53, wherein said second location means comprises
means for representing the region of interest in the object as a plurality of eigenvectors in a multi-dimensional image space, and
correlation means for correlating the region of interest with the eigenvectors.

55. A facial recognition system in accordance with claim 54, wherein said second location module further comprises
second means for representing a feature of the region of interest in the object as a plurality of eigenvectors in a multi-dimensional image space, and
correlation means for correlating the region of interest with the eigenvectors to locate the feature in the image.

56. A facial recognition system in accordance with claim 32, further comprising means for adjusting one of the contrast and brightness of the image.

57. A facial recognition system in accordance with claim 32, further comprising means for correlating the image with a windowing function to generate a correlation map.

58. A facial recognition system in accordance with claim 32, further comprising means for determining a standard deviation and a mean of pixels that constitute the image.

*****
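Claims 43 to 46 recite sampling flesh tone colors into a histogram, transforming it into ST color space, normalizing it according to Bayes Rule (claim 44), and building a second histogram of non-face colors (claim 45). The Python sketch below shows one way such histograms could be combined into a per-pixel flesh tone probability. It is an illustration under stated assumptions, not the patented implementation. This section does not define ST color space, so the sketch assumes normalized chromaticity coordinates s = R/(R+G+B) and t = G/(R+G+B). The bin count, the sample-count priors, and the names `BINS`, `st_bin`, `histogram`, and `skin_posterior` are hypothetical.

```python
from collections import Counter

# Hypothetical histogram resolution per chromaticity axis (an assumption).
BINS = 32


def st_bin(rgb):
    """Map an (R, G, B) pixel to a 2-D chromaticity bin.

    Assumes s = R/(R+G+B) and t = G/(R+G+B) as a stand-in for the
    patent's ST color space; dividing by the sum removes brightness.
    """
    r, g, b = rgb
    total = r + g + b
    if total == 0:
        return (0, 0)  # pure black has no chromaticity; park it in bin (0, 0)
    s = r / total
    t = g / total
    # Clamp so that s or t equal to 1.0 lands in the last bin.
    return (min(int(s * BINS), BINS - 1), min(int(t * BINS), BINS - 1))


def histogram(pixels):
    """Count how many sample pixels fall into each chromaticity bin."""
    return Counter(st_bin(p) for p in pixels)


def skin_posterior(face_pixels, nonface_pixels):
    """Build P(face | color) from face and non-face samples via Bayes' rule.

    Priors P(face) and P(non-face) are assumed proportional to the number
    of training pixels in each set.
    """
    face_hist = histogram(face_pixels)
    nonface_hist = histogram(nonface_pixels)
    n_total = len(face_pixels) + len(nonface_pixels)

    def posterior(rgb):
        key = st_bin(rgb)
        # P(color | face) * P(face) = (count / n_face) * (n_face / n_total)
        joint_face = face_hist[key] / n_total
        joint_nonface = nonface_hist[key] / n_total
        evidence = joint_face + joint_nonface  # P(color)
        return joint_face / evidence if evidence else 0.0

    return posterior
```

Discarding brightness is what lets one histogram cover faces under different illumination. Thresholding the returned probability would give a mask of likely flesh tone pixels, from which a refined region of interest might be taken.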
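Claim 42 recites a motion detector that subtracts selected pixel values of adjacent images captured by the image acquisition element, then compares the resulting difference value with a threshold to decide whether there is motion in the field of view. The minimal Python sketch below assumes grayscale frames given as equal-sized lists of rows. It also assumes that the "difference value" is a mean absolute difference and that the "selected" pixels are every `step`-th row and column. The names `frame_difference` and `motion_detected` and the default `threshold` are hypothetical.

```python
def frame_difference(prev_frame, curr_frame, step=1):
    """Mean absolute difference between two grayscale frames.

    Only every `step`-th row and column is compared, a hypothetical way of
    choosing the 'selected pixel values' that trades sensitivity for speed.
    """
    total = 0
    count = 0
    for prev_row, curr_row in zip(prev_frame[::step], curr_frame[::step]):
        for p, c in zip(prev_row[::step], curr_row[::step]):
            total += abs(c - p)
            count += 1
    return total / count if count else 0.0


def motion_detected(prev_frame, curr_frame, threshold=10.0, step=1):
    """Report motion when the difference value exceeds the threshold."""
    return frame_difference(prev_frame, curr_frame, step) > threshold


if __name__ == "__main__":
    still = [[100] * 4 for _ in range(4)]
    moved = [row[:] for row in still]
    moved[1][1] = moved[1][2] = 250  # two pixels change between frames
    print(frame_difference(still, moved))         # 18.75
    print(motion_detected(still, moved))          # True
    print(motion_detected(still, moved, step=2))  # False: sampled pixels miss the change
```

With `step=2`, the sampled pixels miss the change entirely. This shows how the choice of selected pixels and of the threshold sets the detector's sensitivity.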